[Figure: flowchart of the sequence analysis pipeline. Processing steps: elimination of undesired sequences; clusterization; identification of variants; determination of consensus sequences; extension of consensus sequences; identification of ORFs. Data produced along the way: sequences; masked sequences; clusters and singletons; elongated sequences; consensus sequences; ORFs.]
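The flowchart includes an identification-of-ORFs step but does not state the criteria used. As a minimal sketch only, assuming an ATG start codon, the standard stop codons TAA, TAG and TGA, forward-strand scanning in all three reading frames, and that the longest ORF is kept, the step could look like this (the function name `longest_orf` is hypothetical):

```python
# Hypothetical sketch of an ORF-identification step. Assumptions (not from
# the source): forward strand only, ATG start, TAA/TAG/TGA stops, and the
# longest ORF across the three reading frames is returned.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> tuple[int, int]:
    """Return (start, end) of the longest ATG-initiated ORF.

    Coordinates are 0-based, end exclusive, and include the stop codon.
    Returns (-1, -1) when no complete ORF is found.
    """
    seq = seq.upper()
    best = (-1, -1)
    for frame in range(3):
        start = None
        # Step through complete codons in this frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Keep this ORF if it is longer than the best so far.
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None
    return best


if __name__ == "__main__":
    start, end = longest_orf("CCATGAAATTTTAGCC")
    print(start, end, "CCATGAAATTTTAGCC"[start:end])
```

On the toy sequence CCATGAAATTTTAGCC this returns (2, 14), the ORF ATGAAATTTTAG. A fuller version could also scan the reverse strand and enforce a minimum ORF length.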



Minimum signal    False positive    False negative    proba(0.1)    proba(0.2)
peptide score     rate              rate
3.5               0.121             0.036             0.467         0.664
4                 0.096             0.06              0.519         0.708
4.5               0.078             0.079             0.565         0.745
5                 0.062             0.098             0.615         0.782
5.5               0.05              0.127             0.659         0.813
6                 0.04              0.163             0.694         0.836
6.5               0.033             0.202             0.725         0.855
7                 0.025             0.248             0.763         0.878
7.5               0.021             0.304             0.78          0.889
8                 0.015             0.368             0.816         0.909
8.5               0.012             0.418             0.836         0.92
9                 0.009             0.512             0.856         0.93
9.5               0.007             0.581             0.863         0.934
10                0.006             0.679             0.835         0.919
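The table does not define proba(0.1) and proba(0.2). One reading that roughly fits the numbers, offered here as an assumption rather than as the authors' stated method, is that proba(p) is the probability that a sequence scoring at or above the threshold truly carries a signal peptide when a fraction p of the screened sequences do, obtained from the two error rates by Bayes' rule:

```python
def proba(prior: float, fpr: float, fnr: float) -> float:
    """Assumed definition: P(true signal peptide | score >= threshold).

    prior -- assumed fraction of screened sequences carrying a signal peptide
    fpr   -- false positive rate at the threshold
    fnr   -- false negative rate at the threshold
    """
    true_pos = prior * (1.0 - fnr)    # genuine signal peptides that pass
    false_pos = (1.0 - prior) * fpr   # other sequences that also pass
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # First row of the table: false positive rate 0.121, false negative rate 0.036
    for prior in (0.1, 0.2):
        print(prior, round(proba(prior, fpr=0.121, fnr=0.036), 3))
```

For the first row this gives about 0.470 and 0.666 against the tabulated 0.467 and 0.664, a gap that rounding of the rates could explain. Some later rows fit less closely (at score 10 the formula gives about 0.856 for proba(0.1) versus 0.835), so this interpretation should be checked against the original method before relying on it.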



[Figure: cloning strategy. Steps: reverse transcription; nested PCR; direct cloning; design of 5' nested primers; primer walking and ORF isolation; design of new primers; PCR; cloning. Schematic labels: mRNA with poly(A) tail (AAA); first-strand cDNA (TTT); 5' EST; ORF; signal peptide; PCR primers; PCR product; clone.]



Description of Transcription Factor Binding Sites present on promoters isolated from SignalTag sequences



Promoter sequence P13H2 (546 bp):

Matrix           Position  Orientation  Score  Length  Sequence          Location in SEQ ID NO: 17
CMYB_01          -502      +                   9       TGTCAGTTG         17-25
MYOD_Q6          -501      -                   10      CCCAACTGAC        complement of 18-27
S8_01            -444      -                   11      AATAGAATTAG       complement of 75-85
S8_01            -425      +                   11      AACTAAATTAG       94-104
DELTAEF1_01      -390      -                   11      GCACACCTCAG       complement of 129-139
GATA_C           -364      -                   11      AGATAAATCCA       complement of 155-165
CMYB_01          -349      +                   9       CTTCAGTTG         170-178
GATA1_02         -343      +                   14      TTGTAGATAGGACA    176-189
GATA_C           -339      +                   11      AGATAGGACAT       180-190
TAL1ALPHAE47_01  -235      +                   16      CATAACAGATGGTAAG  284-299
TAL1BETAE47_01   -235      +                   16      CATAACAGATGGTAAG  284-299
TAL1BETAITF2_01  -235      +                   16      CATAACAGATGGTAAG  284-299
MYOD_Q6          -232      -                   10      ACCATCTGTT        complement of 287-296
GATA1_04         -217      -            0.953  13      TCAAGATAAAGTA     complement of 302-314
IK1_01           -126      +            0.963  13      AGTTGGGAATTCC     393-405
IK2_01           -126      +            0.985  12      AGTTGGGAATTC      393-404
CREL_01          -123      +            0.962  10      TGGGAATTCC        396-405
GATA1_02         -96       +            0.950  14      TCAGTGATATGGCA    423-436
SRY_02           -41       -            0.951  12      TAAAACAAAACA      complement of 478-489
E2F_02           -33       +            0.957  8       TTTAGCGC          486-493
MZF1_01          -5        -            0.975  8       TGAGGGGA          complement of 514-521

Promoter sequence P15B4 (861 bp):

Matrix           Position  Orientation  Score  Length  Sequence          Location in SEQ ID NO: 20
NFY_Q6           -748      -            0.956  11      GGACCAATCAT       complement of 60-70
MZF1_01          -738      +            0.962  8       CCTGGGGA          70-77
CMYB_01          -684      +            0.994  9       TGACCGTTG         124-132
VMYB_02          -682      -            0.985  9       TCCAACGGT         complement of 126-134
STAT_01          -673      +            0.968  9       TTCCTGGAA         135-143
STAT_01          -673      -            0.951  9       TTCCAGGAA         complement of 135-143
MZF1_01          -556      -            0.956  8       TTGGGGGA          complement of 252-259
IK2_01           -451      +            0.965  12      GAATGGGATTTC      357-368
MZF1_01          -424      +            0.986  8       AGAGGGGA          384-391
SRY_02           -398      -            0.955  12      GAAAACAAAACA      complement of 410-421
MZF1_01          -216      +            0.960  8       GAAGGGGA          592-599
MYOD_Q6          -190      +            0.981  10      AGCATCTGCC        618-627
DELTAEF1_01      -176      +            0.958  11      TCCCACCTTCC       632-642
S8_01            5         -            0.992  11      GAGGCAATTAT       complement of 813-823
MZF1_01          16        -            0.986  8       AGAGGGGA          complement of 824-831

Promoter sequence P29B6 (555 bp):

Matrix           Position  Orientation  Score  Length  Sequence          Location in SEQ ID NO: 23
ARNT_01          -311      +            0.964  16      GGACTCACGTGCTGCT  191-206
NMYC_01          -309      +            0.965  12      ACTCACGTGCTG      193-204
USF_01           -309      +            0.985  12      ACTCACGTGCTG      193-204
USF_01           -309      -            0.985  12      CAGCACGTGAGT      complement of 193-204
NMYC_01          -309      -            0.956  12      CAGCACGTGAGT      complement of 193-204
MYCMAX_02        -309      -            0.972  12      CAGCACGTGAGT      complement of 193-204
USF_C            -307      +            0.997  8       TCACGTGC          195-202
USF_C            -307      -            0.991  8       GCACGTGA          complement of 195-202
MZF1_01          -292      -            0.968  8       CATGGGGA          complement of 210-217
ELK1_02          -105      +            0.963  14      CTCTCCGGAAGCCT    397-410
CETS1P54_01      -102      +            0.974  10      TCCGGAAGCC        400-409
AP1_Q4           -42       -            0.963  11      AGTGACTGAAC       complement of 460-470
AP1FJ_Q2         -42       -            0.961  11      AGTGACTGAAC       complement of 460-470
PADS_C           45        +            1.000  9       TGTGGTCTC         547-555

FIGURE 5 
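In these binding-site tables a site with orientation "-" is located as "complement of a-b", and its Sequence entry reads as the reverse complement of the plus-strand span: in P29B6, for instance, ACTCACGTGCTG is reported at 193-204 and CAGCACGTGAGT at complement of 193-204. The sketch below, with hypothetical helper names, parses that location notation and checks the strand relationship. Treating coordinates as 1-based and inclusive is an assumption, chosen because it matches the Length column.

```python
import re

# Maps each base to its Watson-Crick partner (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
# Matches "a-b" or "complement of a-b" (the space after "of" is optional).
_LOCATION = re.compile(r"(complement of\s*)?(\d+)-(\d+)")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_location(text: str) -> tuple[str, int, int]:
    """Parse 'a-b' or 'complement of a-b' into (strand, start, end).

    Coordinates are assumed 1-based and inclusive, which agrees with the
    Length column (e.g. 191-206 spans 16 bases).
    """
    match = _LOCATION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognised location: {text!r}")
    strand = "-" if match.group(1) else "+"
    return strand, int(match.group(2)), int(match.group(3))


def site_length(text: str) -> int:
    """Number of bases covered by a location string."""
    _, start, end = parse_location(text)
    return end - start + 1


if __name__ == "__main__":
    # Two P29B6 rows reported on opposite strands of the same span 193-204.
    plus, minus = "ACTCACGTGCTG", "CAGCACGTGAGT"
    print(parse_location("complement of 193-204"), site_length("193-204"))
    print(reverse_complement(plus) == minus)
```

The same check can be applied row by row to confirm that each Length entry equals the span of its Location entry.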